Dying ReLU
ReLU outputs 0 whenever its pre-activation is negative, which adds useful sparsity and keeps the network simple.
But if a neuron's pre-activation stays negative for most inputs, its output and its gradient are both zero, so it stops updating; when this happens to many neurons, large parts of the network go dead.
Common fixes (compared numerically in the sketch after this list):
- Swish (SiLU), which gates the input with a sigmoid: SiLU(x) = x * sigmoid(x), so negative inputs still produce a small nonzero output and gradient.
- GELU, which weights the input by the Gaussian CDF: GELU(x) = x * Phi(x), giving a smooth curve that also leaks gradient for negative inputs.
- Pairing ReLU with BatchNorm, which recenters pre-activations so fewer neurons get stuck in the all-negative region.
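
To make the contrast concrete, here is a minimal NumPy sketch (the helper names relu, silu, gelu and the tanh approximation of GELU are my own illustration, not a library API) that compares values and numerical gradients at negative inputs. ReLU's gradient is exactly zero there, while SiLU and GELU keep a small nonzero gradient, so the neuron can still recover.

```python
import numpy as np

def relu(x):
    # ReLU: zero output and zero gradient for negative inputs
    return np.maximum(0.0, x)

def silu(x):
    # SiLU / Swish: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def gelu(x):
    # GELU via the common tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def grad(f, x, eps=1e-4):
    # central finite-difference derivative, accurate enough for a demo
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = np.array([-3.0, -1.0, -0.1])
for name, f in [("ReLU", relu), ("SiLU", silu), ("GELU", gelu)]:
    print(f"{name:5s} f(x) = {f(x).round(4)}  f'(x) = {grad(f, x).round(4)}")
```

Running it shows ReLU's gradient as all zeros on these negative inputs, while SiLU and GELU return small negative values with nonzero gradients, which is exactly why they avoid the dying-neuron failure mode.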